Scaled dot-product attention
Scaled Dot-Product Attention
That is, dot-product attention with a scaling-down step.
Before the softmax, the dot-product scores are scaled down (divided by √d_k). This is the opposite direction from the soft-argmax approximation, where the scores are scaled up so that softmax approaches argmax.
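The two directions of scaling can be written side by side. The first line is the standard Transformer definition, where Q, K, V are the query, key, and value matrices and d_k is the key dimension. The second is the soft-argmax limit, with β as an inverse-temperature factor (a symbol introduced here for illustration, not from the original note).

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
\quad\text{(a factor } \beta = 1/\sqrt{d_k} < 1 \text{ when } d_k > 1\text{)}

\lim_{\beta \to \infty} \mathrm{softmax}(\beta\, x)_i =
\begin{cases} 1 & i = \arg\max_j x_j \\ 0 & \text{otherwise} \end{cases}
\quad\text{(soft-argmax, assuming a unique maximum)}
```

Soft-argmax pushes β above 1, while scaled dot-product attention uses β below 1.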
In short, it makes soft attention even softer.
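A minimal pure-Python sketch of the "softer" effect follows. It runs the same attention scores through a softmax with three inverse-temperature factors: a large one (soft-argmax direction), 1 (plain dot product), and 1/√d_k (scaled dot-product attention). The query and key vectors, the value beta=10.0, and the helper names `softmax`, `entropy`, and `dot` are all hypothetical, chosen only for illustration. A softer distribution has a smaller maximum weight and a higher entropy.

```python
import math

def softmax(scores, beta=1.0):
    """Softmax with inverse temperature beta (beta > 1 sharpens, beta < 1 softens)."""
    scaled = [beta * s for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy in nats; higher means a softer, more spread-out distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical query and key vectors with d_k = 4 (made-up numbers).
d_k = 4
query = [1.0, 2.0, 0.5, 1.5]
keys = [
    [1.0, 2.0, 0.5, 1.5],
    [0.5, 1.0, 1.0, 0.0],
    [0.0, 0.5, 2.0, 1.0],
]
raw = [dot(query, k) for k in keys]  # unscaled dot-product scores

sharp = softmax(raw, beta=10.0)                 # soft-argmax direction: toward one-hot
plain = softmax(raw)                            # unscaled dot-product attention
scaled = softmax(raw, beta=1 / math.sqrt(d_k))  # scaled dot-product attention: softer

for name, p in [("beta=10", sharp), ("unscaled", plain), ("1/sqrt(d_k)", scaled)]:
    print(f"{name:>12}: max weight {max(p):.3f}, entropy {entropy(p):.3f}")
```

The stated reason for the scaling in the Transformer paper is that dot products of d_k-dimensional vectors grow in magnitude with d_k. Large scores push the softmax into saturated regions with very small gradients, so dividing by √d_k keeps it in a softer, better-behaved range.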
---
This page is auto-translated from /nishio/縮小付き内積注意 using DeepL. If something looks interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.